- 
                Notifications
    You must be signed in to change notification settings 
- Fork 298
fix: pod with a restart policy of Never or OnFailure stuck at 'Progressing' (#15317) #709
base: master
Are you sure you want to change the base?
Conversation
| Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@            Coverage Diff             @@
##           master     #709      +/-   ##
==========================================
- Coverage   54.26%   47.34%   -6.93%     
==========================================
  Files          64       64              
  Lines        6164     6537     +373     
==========================================
- Hits         3345     3095     -250     
- Misses       2549     3187     +638     
+ Partials      270      255      -15     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
| This looks like a good approach to the problem. | 
| The pod manifest needs the following to pass the tests: 
 | 
        
          
                pkg/health/testdata/pod-running-restart-never-with-ignore-annotation.yaml
              
                Fixed
          
            Show fixed
            Hide fixed
        
              
          
                pkg/health/testdata/pod-running-restart-never-with-ignore-annotation.yaml
              
                Fixed
          
            Show fixed
            Hide fixed
        
      | @drewhemm I have made the changes you suggested to get a Quality Gate pass | 
| Cool, looks like the last blocking issue is the commit sign off. | 
444a326    to
    84a1039      
    Compare
  
    | A non-blocking issue has been flagged by SonarQube, probably best to resolve it as follows:  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. left nits.
f6e7045    to
    61ddfd1      
    Compare
  
    | 
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| This needs to be merged to solve lots of subsequent bugs that have been raised. | 
| @christianh814 would you be able to give this PR a review? | 
| @crenshaw-dev would you be able to give this PR a review? Much appreciated! | 
        
          
                pkg/health/health_pod.go
              
                Outdated
          
        
      | ) | ||
|  | ||
| const ( | ||
| AnnotationIgnoreRestartPolicy = "argocd.argoproj.io/ignore-restart-policy" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest putting it in a separate file -
gitops-engine/pkg/sync/common/types.go
Line 14 in 093aef0
| AnnotationSyncWave = "argocd.argoproj.io/sync-wave" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nitishfy Thanks for your comment. I have moved this to types.go as you suggested. Indeed better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs required!
3de2f0b    to
    ffeef76      
    Compare
  
    Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
…sses the checks of the Quality Gate Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
improve code readability Co-authored-by: sivchari <shibuuuu5@gmail.com> Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
Signed-off-by: Roelof Kuijpers <roelof.kuijpers@energyessentials.nl>
| 
 | 
| 
 Have added info to the documentation now! | 
| A workaround is to use the annotation: argocd.argoproj.io/ignore-restart-policy: "true". | ||
| When this annotation is set on the Pod resource, the controller will ignore the `restartPolicy` and consider the Pod *Running* as a valid healthy state. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior, combined with hooks, would be problematic since hook would be Healthy before their completions.
It is good to explain the current behavior. But should be part of the health check documentation, and not the hooks.
| policy := pod.Spec.RestartPolicy | ||
| // If the pod has the AnnotationIgnoreRestartPolicy annotation or its restart policy is Always, | ||
| // then treat it as a long-running pod and check its health status. | ||
| if _, ok := pod.Annotations[common.AnnotationIgnoreRestartPolicy]; ok || policy == corev1.RestartPolicyAlways { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this feature is necessary with the usage of argocd.argoproj.io/ignore-healthcheck: 'true'. The pod will show running, because that is the behavior that is configured in the Pod and we should have a consistent behaviour with Kubernetes.
If a controller is creating resource on which we cannot assume application healthiness reliably, then we should exclude those.



This implementation extends the health condition check for pods.
Previously the assumption was that Pods with restart policy of Never or OnFailure are hooks with a finite life, these were considered as Progressing instead of Healthy. However, this logic does not apply when the pod is managed by an operator (e.g., Flink operator) and therefore has a restart policy of Never.
We introduce a new annotation which existence is checked when the pod is Running, that allows for skipping this logic on restart policy.